Pig[1] is a high-level platform for creating MapReduce programs that run on Hadoop. The language for this platform is called Pig Latin.[1] Pig Latin abstracts the programming from the Java MapReduce idiom into a notation that makes MapReduce programming high-level, similar to that of SQL for relational database management systems. Pig Latin can be extended using user-defined functions (UDFs), which the user can write in Java and then call directly from the language.
Pig was originally[2] developed at Yahoo! Research around 2006 to give researchers an ad hoc way of creating and executing MapReduce jobs on very large data sets. In 2007,[3] it was moved into the Apache Software Foundation.[4]
Below is an example of a "Word Count" program in Pig Latin:
A = load '/tmp/my-copy-of-all-pages-on-internet';          -- load one line per record
B = foreach A generate flatten(TOKENIZE((chararray)$0)) as word;  -- split each line into words
C = filter B by word matches '\\w+';                       -- keep only word characters
D = group C by word;                                       -- group occurrences of each word
E = foreach D generate COUNT(C) as count, group as word;   -- count each group
F = order E by count desc;                                 -- sort by frequency, highest first
store F into '/tmp/number-of-words-on-internet';           -- write the results
The above program generates parallel executable tasks that can be distributed across thousands of machines in a Hadoop cluster to count the words in a data set such as all the web pages on the internet.
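As a rough, non-distributed illustration of what the Pig script computes, the same dataflow (tokenize, filter, group, count, order) can be sketched in plain Python; the function name and sample input here are illustrative, not part of Pig:

```python
import re
from collections import Counter

def word_count(lines):
    """Mimic the Pig dataflow on an in-memory list of lines:
    TOKENIZE + flatten, filter by '\\w+', group by word,
    COUNT each group, then order by count descending."""
    words = []
    for line in lines:                        # A: load
        for token in line.split():            # B: TOKENIZE and flatten
            if re.fullmatch(r"\w+", token):   # C: filter (Pig's `matches` is anchored)
                words.append(token)
    counts = Counter(words)                   # D + E: group and count
    # F: order by count, highest first
    return sorted(counts.items(), key=lambda kv: -kv[1])

print(word_count(["the quick fox", "the fox"]))
```

Pig's `matches` operator tests the whole string against the regular expression, which is why `re.fullmatch` (rather than `re.search`) is the closest Python analogue. In the real Pig program, each of these steps becomes one or more MapReduce stages executed in parallel across the cluster.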